Theaflavin 3-gallate inhibits the main protease (Mpro) of SARS-CoV-2 and reduces its count in vitro

The main protease (Mpro) of SARS-CoV-2 has been recognized as an attractive drug target because of its central role in viral replication. Our previous preliminary molecular docking studies showed that theaflavin 3-gallate (a natural bioactive molecule derived from theaflavin and found in high abundance in black tea) exhibited better docking scores than repurposed drugs (Atazanavir, Darunavir, Lopinavir). In this study, conventional and steered MD simulation analyses revealed stronger interactions of theaflavin 3-gallate with the active site residues of Mpro than those of theaflavin and the standard molecule GC373 (a known inhibitor of Mpro and a novel broad-spectrum anti-viral agent). Theaflavin 3-gallate inhibited the Mpro protein of SARS-CoV-2 with an IC50 value of 18.48 ± 1.29 μM. Treatment of SARS-CoV-2 (Indian/a3i clade/2020 isolate) with 200 μM of theaflavin 3-gallate in vitro using Vero cells, followed by quantification of viral transcripts, demonstrated a reduction of the viral count by 75% (viral particles reduced from Log10 6.7 to Log10 6.1). Overall, our findings suggest that theaflavin 3-gallate effectively targets Mpro, thus limiting the replication of the SARS-CoV-2 virus in vitro.

The ongoing COVID-19 pandemic due to SARS-CoV-2 has paralyzed the whole world, motivating the scientific community worldwide to find possible remedies 1. The COVID-19 outbreak in December 2019 developed into a global pandemic in just a few months, spreading to more than 222 countries, areas, or territories [2][3][4]. SARS-CoV-2 belongs to a family of enveloped, single-stranded, positive-sense, and very diverse RNA viruses 1. The genome of SARS-CoV-2 is composed of about 30,000 nucleotides; its replicase gene encodes two overlapping polyproteins, namely pp1a and pp1ab, required for the replication and transcription of the virus [4][5][6]. The polyproteins are proteolytically processed, mainly by the 33.8-kDa main protease (Mpro), also known as 3C-like protease 7. Mpro undergoes autolytic cleavage from polyproteins pp1a and pp1ab and then cleaves the polyprotein at 11 conserved sites 7. This functional importance of Mpro in the life cycle of the virus, along with the absence of closely related homologs in humans, makes Mpro an attractive target for anti-viral drug design 7. Though several vaccines against SARS-CoV-2 have been developed and granted emergency use authorization by the Food and Drug Administration, there is still uncertainty about their long-term side effects, and no strong scientific data are available regarding the safety of these vaccines for pregnant or breastfeeding women 8. Afshar et al. 9 reported that patients with co-morbidities and people in special groups (e.g., cancer, organ transplant recipients, chronic liver diseases, diabetes mellitus, autoimmune disorders, end-stage renal disease, neurological disorders, chronic obstructive pulmonary disease, HIV, smokers, pregnant women, breastfeeding women, elderly people, children, patients with allergic reactions) who are vaccinated against SARS-CoV-2 need continuous medical monitoring post-immunization, since the long-term effects of vaccination in these vulnerable groups are yet to be scientifically evidenced. A recent finding suggests that in the USA, up to one-third of the population is unsure about being vaccinated against SARS-CoV-2 or is sure they will not do so 10. Moreover, it is common to observe re-infections flaring up in immunized individuals, as absolute immunity is not acquired post-immunization from an infection or vaccine [11][12][13][14]. Therefore, potential molecules that can inhibit viral replication and disrupt the interaction between the viral protein and host receptor can add significant value in managing COVID-19 11.
Several anti-viral compounds were screened in silico in various laboratories worldwide to facilitate rapid drug discovery. For example, Ghahremanpour et al. identified manidipine, boceprevir, lercanidipine, bedaquiline, and efonidipine as inhibitors of the Mpro protein of SARS-CoV-2 15. Similarly, the 2-phenyl-1,2-benzoselenazol-3-one class of compounds was reported to inhibit the Mpro protein of SARS-CoV-2 at nanomolar concentrations 16. However, these studies still lack sufficient wet-lab experimentation followed by clinical trials to support their claims. Many natural molecules and their derivatives have entered different stages of drug design, including clinical trials against diseases like cardiovascular diseases, malaria, HIV-AIDS, etc. [17][18][19], and more than 50% of approved drugs are reported to be based on natural products 20.
Therefore, natural molecules have caught the attention of researchers hunting for anti-SARS-CoV-2 drug candidates, and several natural molecules belonging to different classes and sources of origin have been explored through in silico and experimental investigations 21,22. Natural molecules such as theaflavin, rutin, curcumin, salvianolic acid, and many flavones were reported to have anti-SARS-CoV-2 activity [21][22][23][24][25]. Among natural molecules, tea polyphenols are well known to exhibit anti-HIV, anti-cancer, anti-oxidative, anti-mutagenic, anti-diabetic, and hypocholesterolemic activities 26,27. The beneficial effects of green tea, oolong tea, and black tea have long been known 28. Recent studies have also shown the anti-SARS-CoV-2 potential of several bioactive tea molecules. For example, epigallocatechin gallate was shown to disrupt the interaction of the spike protein of SARS-CoV-2 with the angiotensin-converting enzyme 2 receptors of host cells 29. Epicatechin-3,5-di-O-gallate, epigallocatechin-3,5-di-O-gallate, and epigallocatechin-3,4-di-O-gallate were identified as inhibitors of RNA-dependent RNA polymerase 30. Through in silico studies, our group has previously recognized three lead molecules, oolonghomobisflavan-A, theasinensin-D, and theaflavin-3-O-gallate, as potential inhibitors of the Mpro protein of SARS-CoV-2 with docking scores higher than those of the repurposed drugs Atazanavir, Darunavir, and Lopinavir 30. Since the average content of oolonghomobisflavan-A (1.1 mg/100 g) and theasinensin-D (1.8 mg/100 g) in different types of tea is much lower than that of theaflavin 3-gallate (148.6 mg/100 g), we selected the latter for detailed studies 31. Additionally, oolonghomobisflavan-A and theasinensin-D are poorly soluble in water 32, while theaflavin and its derivatives are highly soluble 33.
Considering the above-stated reasons and the enormous therapeutic potential of theaflavin 3-gallate 34,35, we performed wet-lab studies to test the inhibition potential of theaflavin 3-gallate, along with GC376 (a known inhibitor of Mpro) and theaflavin (the precursor of theaflavin 3-gallate), against the Mpro protein of SARS-CoV-2. A cell-based assay was performed to confirm the antiviral activity of theaflavin 3-gallate. Further, in silico studies were conducted to gain insight into the mechanism of action of theaflavin 3-gallate on Mpro.

Results
Theaflavin 3-gallate inhibited Mpro protein of SARS-CoV-2. Mpro was inhibited by more than 80% after incubation with 100 µM of theaflavin or theaflavin 3-gallate for 30 min. Accordingly, both molecules were tested for inhibition of Mpro at lower concentrations. The IC50 values of theaflavin and theaflavin 3-gallate against the Mpro protein were calculated to be 22.22 ± 1.4 μM and 18.48 ± 1.29 μM, respectively (Fig. 1). GC376, a known inhibitor of the Mpro protein, was used as a positive control. The IC50 for GC376 was calculated to be 0.24 ± 0.04 μM (Fig. 1), which is slightly lower than the value of 0.42 μM reported with the kit.
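The relation between the reported IC50 values and the initial 100 µM screen can be checked with a simple dose-response model. The sketch below assumes a Hill equation with a Hill coefficient of 1, which is an assumption of this illustration and not a fitting model stated in the study; the function name `fractional_inhibition` is likewise hypothetical.

```python
def fractional_inhibition(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Fraction of enzyme activity inhibited at a given inhibitor concentration.

    Uses the Hill equation: inhibition = 1 / (1 + (IC50 / c)^h).
    The default Hill coefficient of 1 is an assumption, not a value from the study.
    """
    if conc_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    if conc_um == 0:
        return 0.0
    return 1.0 / (1.0 + (ic50_um / conc_um) ** hill)


# IC50 values (µM) as reported in the text; 100 µM is the initial screening concentration.
reported_ic50 = {
    "theaflavin": 22.22,
    "theaflavin 3-gallate": 18.48,
    "GC376": 0.24,
}

for name, ic50 in reported_ic50.items():
    pct = 100.0 * fractional_inhibition(100.0, ic50)
    print(f"{name}: {pct:.1f}% predicted inhibition at 100 µM")
```

Under this assumed model, theaflavin and theaflavin 3-gallate are predicted to inhibit roughly 82% and 84% of Mpro activity at 100 µM, which agrees with the observation of more than 80% inhibition in the initial screen.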
Theaflavin 3-gallate reduced viral count in vitro in a dose-dependent manner. Incubation of theaflavin and theaflavin 3-gallate with SARS-CoV-2 reduced the viral count to different extents (Fig. 2). It is important to emphasize that the decrease in viral count was measured by quantifying viral transcripts under in vitro conditions, i.e., in an experimental setup using Vero cells, not in a living organism. At the lower concentrations of 12.50 and 25 µM, theaflavin and theaflavin 3-gallate reduced the viral count by less than 24%. At 100 µM, the viral count was reduced by 26% with theaflavin and 42% with theaflavin 3-gallate. At 200 µM, theaflavin and theaflavin 3-gallate showed a reduction of 40% and 75%, respectively. Treatment with theaflavin 3-gallate at 200 µM reduced the viral particles from Log10 6.7 to Log10 6.1. As anticipated, treatment with remdesivir (positive control) inhibited the virus at concentrations of 0.5, 0.75, and 1.0 µM (Fig. 2).
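The 75% figure and the Log10 values describe the same change on two scales. A minimal sketch of the conversion is given below; the function name is hypothetical and the only inputs are the two Log10 values quoted in the text.

```python
def percent_reduction_from_log10(log_before: float, log_after: float) -> float:
    """Percent reduction in count when log10(count) drops from log_before to log_after."""
    remaining_fraction = 10.0 ** (log_after - log_before)
    return 100.0 * (1.0 - remaining_fraction)


# Log10 6.7 -> Log10 6.1 for theaflavin 3-gallate at 200 µM, as quoted in the text.
reduction = percent_reduction_from_log10(6.7, 6.1)
print(f"Reduction in viral count: {reduction:.1f}%")
```

A drop of 0.6 log units leaves about a quarter of the original count, so the reduction is close to 75%, matching the reported value. It also shows that a 75% reduction corresponds to less than one log unit.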
Theaflavin 3-gallate interacts with the active site residues of Mpro. The availability of protein crystal structures allowed us to generate and visualize the interactions between the protein and its ligands. The optimal poses of GC373 (the active form of the prodrug GC376), theaflavin, and theaflavin 3-gallate with the Mpro (PDB ID: 6M0K, resolution 1.50 Å) of SARS-CoV-2 were generated by molecular docking. The docking scores in terms of interaction energy for GC373, theaflavin, and theaflavin 3-gallate were 50.54 kcal/mol, 57.41 kcal/mol, and 74.35 kcal/mol, respectively. The binding site on Mpro is present between domains I and II, as shown in Fig. 3a. The best poses with the highest docking score were selected, as shown in [...] Table 2. Among the selected molecules, theaflavin 3-gallate showed the lowest binding energy (highest affinity) of −48.02 ± 1.61 kcal/mol, followed by theaflavin and GC373. The van der Waals energy contributed most significantly to the binding between the ligands and Mpro of SARS-CoV-2. The unfavorable contribution of the polar solvation energy was highest for theaflavin 3-gallate. However, its much larger favorable contributions from van der Waals and electrostatic energies resulted in the lowest binding energy for theaflavin 3-gallate compared with GC373 and theaflavin. The final binding energy of theaflavin 3-gallate provided a strong rationale for its high inhibition potential against Mpro. The per-frame trajectories of binding energies for all three complexes throughout the simulation are provided in Fig. S5.
A strong external pulling force is required to unbind theaflavin 3-gallate from Mpro. SMD simulations can offer qualitative perspectives on the interactions and conformational perturbations between a ligand and its neighboring amino acids by inducing ligand unbinding along the MD simulation pathway. In this study, the well-equilibrated protein-ligand complexes were taken as starting points for SMD simulations. A pulling simulation was performed on all the complexes with a constant pulling velocity of 0.01 nm/ps and a spring constant of 250 kJ/mol/nm². The representative force profiles of GC373, theaflavin, and theaflavin 3-gallate unbinding from the binding site of Mpro are shown in Fig. 5. All the complexes showed a linear increase in the time-dependent external pulling force during the initial phase of the SMD simulation. The standard molecule GC373 required the lowest external pulling force to completely unbind from the binding pocket. The peak values of external pulling force experienced by theaflavin and theaflavin 3-gallate were ~350.02 kJ/mol/nm and ~368.47 kJ/mol/nm, respectively. Afterward, the external pulling force gradually decreased as the ligand left the binding site entirely. The time taken by theaflavin 3-gallate to reach the maximum pulling force was longer than that for theaflavin and GC373, suggesting that theaflavin 3-gallate remained bound to the active site for a longer duration than the other two compounds. These results show that theaflavin 3-gallate required a strong external pulling force to separate it from the binding site of Mpro of SARS-CoV-2.

Discussion and conclusions
The emergence of the highly pathogenic human viruses SARS-CoV and MERS-CoV, which belong to the β-coronavirus group, has gained attention because of their zoonotic transmission and their establishment as major pathogenic strains 36,37. The timely development of anti-viral drugs is of utmost importance. Still, drug development in a short span is exceptionally challenging due to inadequate knowledge of the zoonotic source. Drug repurposing could be an alternative in which molecules/compounds of known therapeutic effect are screened and tested for inhibition of viruses, including SARS-CoV-2 38. Plant-based compounds offer a rich reservoir for novel anti-viral drug development. They have been reported to possess anti-viral activity against Herpes Simplex Virus type-1, Coronavirus, Influenza Virus, Hepatitis C virus, and Human Immunodeficiency Virus [39][40][41]. Our previous in silico studies showed that the active molecule theaflavin 3-O-gallate has a strong affinity for the substrate-binding pocket of Mpro of SARS-CoV-2 42. To validate this observation, we quantified the inhibitory potential (IC50) of theaflavin 3-gallate against the Mpro protein of SARS-CoV-2, which was found to be 18.48 ± 1.29 μM (Fig. 1). Also, incubation of theaflavin 3-gallate with the SARS-CoV-2 virus produced a 75% reduction of the viral count at a concentration of 200 μM (Fig. 2).
To understand the mechanism of inhibitory action of theaflavin 3-gallate, we performed molecular docking studies, which provide a platform to predict the binding affinity and optimal binding pose between receptor and ligand. The binding poses of theaflavin 3-gallate were compared with those of the standard drug GC373 and the parent molecule theaflavin. Molecular docking analysis showed that theaflavin 3-gallate interacts strongly with the binding site of Mpro, with a higher docking score than GC373 and theaflavin. Theaflavin 3-gallate occupied the active site of Mpro by interacting with many residues (Asn142, Ser144, Cys145, His163, and Glu166) (Fig. 3) that are crucial for dimerization and biological activity. Many experimental and computational studies have shown potent inhibitor molecules interacting with these residues 7,23,24,30,43,44. The stability of the binding poses was validated by different MD-driven time-dependent analyses (Fig. 4). The low deviations in RMSD values for all three structures (Fig. 4) suggested that ligand binding had no impact on the stability of the binding pocket. We also compared the number of H-bonds formed by GC373, theaflavin, and theaflavin 3-gallate with the binding site of Mpro throughout the simulation. The analysis of the H-bond profiles of the three molecules showed that theaflavin 3-gallate formed the highest number of H-bonds during the simulation (Fig. 4d). Protein-ligand conformations at different time intervals were extracted from the MD trajectories to gain in-depth insight into the molecular interactions during the simulations. Among the three compounds, only theaflavin 3-gallate formed H-bonds with residues His41, Val186, and His164 during the simulation; these interactions were absent for theaflavin and GC373 (Fig. 4b,c).
Theaflavin 3-gallate interacts more strongly than theaflavin and GC373 with the Cys145 residue, which forms part of the catalytic dyad of the active site of Mpro 7. The strong interactions shown by theaflavin 3-gallate with Mpro were further validated and compared with those of GC373 and theaflavin by calculating the binding free energy with the MMPBSA method, which is an efficient and reliable method for evaluating protein-ligand interactions 45,46. The binding free energy results confirmed that theaflavin 3-gallate had a higher binding affinity for Mpro than GC373 and theaflavin (Table 2). Our results showed that the van der Waals energy contributed most favorably to the binding of theaflavin 3-gallate with Mpro (Table 2). Moreover, we performed SMD simulations to analyze the external force required to unbind GC373, theaflavin, and theaflavin 3-gallate from the binding pocket. Our SMD results demonstrated that theaflavin 3-gallate required the highest external force to unbind from the binding pocket of the Mpro of SARS-CoV-2 (Fig. 5). Also, theaflavin 3-gallate remained at the active site for longer than GC373 and theaflavin during the pulling simulations (Fig. 5), suggesting strong interactions with the residues of the binding site.
In conclusion, theaflavin 3-gallate exhibited a strong binding affinity for Mpro, as evidenced by in silico and enzyme-based inhibition studies, and reduced the SARS-CoV-2 viral count, as evident from cell-based experiments. The overall results suggest the potential use of theaflavin 3-gallate against SARS-CoV-2. Theaflavin 3-gallate is a major component of black tea, which is already known for its anti-oxidant properties and is the most consumed beverage in the world. Since theaflavin 3-gallate is already consumed by humans through tea and crosses cell-cell barriers in the body 47, it is an excellent potential inhibitor of SARS-CoV-2. The molecule, either alone or formulated with other such anti-viral compounds as a cocktail, could provide an effective first line of defense against diseases associated with coronaviruses.

Methods
Theaflavin (molecular weight = 564.49 g/mol) and theaflavin 3-gallate (molecular weight = 716.61 g/mol) were procured from PhytoLab GmbH & Co. KG, Germany. The authenticity and purity of the molecules were validated by mass spectrometry analysis using a UHPLC-IM-QTOF 6560 instrument (Agilent, USA) equipped with a PDA detector and hyphenated to the Q-TOF MS/MS (Fig. S6). Stock solutions (5 mM) of theaflavin and theaflavin 3-gallate were prepared in methanol. Appropriate dilutions were made in assay buffer to test the molecules at various concentrations, keeping the methanol concentration below 1% in the final reaction volume.
All the assays were carried out at room temperature (RT: 25 ± 2 °C) in triplicate. As an initial screen, theaflavin and theaflavin 3-gallate were tested for inhibition of the Mpro protein at 100 µM and later tested at lower concentrations to calculate the IC50 value.

Post-infection, the media was replaced with DMEM containing 10% FBS and the drugs, and the cells were maintained in an incubator at 37 °C, 5% CO2 for 72 h. After 72 h, the cell supernatants were collected to enumerate the cell-released viral RNA particles using quantitative real-time PCR. Viral RNA extraction was performed with the Viral/Pathogen Extraction Kit (Applied Biosystems, Thermo Fisher Scientific) according to the manufacturer's protocol. Viral supernatants (200 µL) from the experimental groups were, after centrifugation, aliquoted into deep-well plates and mixed with lysis buffer containing 260 µL binding solution, 10 µL binding beads, and 5 µL Proteinase K (with respect to the stated sample volume) from the extraction kit. The RNA extraction step was performed using the Kingfisher Flex (version [...]

Molecular docking. We used the resolved crystal structure (resolution 1.50 Å) of the Mpro protein of SARS-CoV-2 (PDB ID: 6M0K) 38 for a molecular docking study with the CDOCKER protocol of Discovery Studio 48. The structures of theaflavin and theaflavin 3-gallate were downloaded in SDF format from PubChem (CID: 169167) 49. The Gaussian protocol with density functional theory was used for ligand geometry optimization and energy minimization 50. The binding site of Mpro was defined with reference to the co-crystallized inhibitor (11b). The sphere coordinates for the Mpro binding pocket were 11.62, 11.88, and 68.70, with a radius of 12.00 Å. The other parameters were kept at their defaults. Of all the predicted binding conformations, the top five poses with the highest CDOCKER energy were retained, and the one with the best docking score was reported.

Molecular dynamics (MD) simulations and thermodynamic free energy calculations. The best binding poses of theaflavin 3-gallate with Mpro were subjected to MD simulations using the GROMACS package [51][52][53]. Ligand topologies for the CHARMM36 force field were generated with the CGenFF server 54,55. Similarly, the protein topologies were prepared with the CHARMM36 force field using the "gmx pdb2gmx" command of GROMACS. The protein-ligand complexes for simulation were prepared by appending the ligand topologies to the protein topologies. To maintain an overall neutral charge of the system, sodium and chloride ions were added using the "gmx genion" command. Further steps of the MD simulations were carried out by following the protocol defined in our previous studies 42,56,57. The energy minimization of the simulated system was performed using the steepest descent algorithm for about 50,000 steps. After that, the system was equilibrated under the NVT ensemble for 1000 ps, followed by the NPT ensemble for 1000 ps. Parrinello-Rahman pressure and Berendsen temperature coupling were employed throughout the equilibration to maintain a constant pressure of 1 bar and a temperature of 300 K, respectively. Finally, an MD production run of about 100 ns was performed without restraints. The widely used particle mesh Ewald (PME) method was used to calculate long-range electrostatic interactions. The Lennard-Jones potential with a cutoff value of 1 nm was used to calculate short-range van der Waals interactions. The LINear Constraint Solver (LINCS) algorithm was employed to constrain all covalent bond lengths, including those involving hydrogen atoms. The simulation trajectories of the complexes were used to determine the root mean square deviations (RMSD) of backbone C-α atoms, extract binding poses at different time intervals, and calculate thermodynamic binding free energies.
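To make the RMSD metric used above concrete, the following is a minimal sketch of the calculation for two sets of atomic coordinates. It assumes the two structures are already superimposed and list the same atoms in the same order; trajectory analysis tools normally perform a least-squares fit first, which this sketch omits. The function name and the toy coordinates are hypothetical and are not data from the study.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between two equally sized, pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example (nm): a reference frame and a slightly displaced frame.
reference = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0)]
displaced = [(0.0, 0.1, 0.0), (0.38, 0.1, 0.0), (0.76, 0.1, 0.0)]
print(f"RMSD = {rmsd(reference, displaced):.3f} nm")
```

Computing this value for every frame against the starting structure gives the kind of time series that is used to judge whether a protein-ligand complex remains stable during a simulation.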
We utilized the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) method to calculate the free energies of protein-ligand binding. MM-PBSA is a classical and validated method for modeling protein-ligand interactions 46 .
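For orientation, the MM-PBSA estimate is commonly written as the decomposition below. This is the general textbook form, given here as an assumption about the structure of the calculation and not as the exact expression implemented by the authors' tool; in particular, the text does not state whether the entropic term was included.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left(G_{\mathrm{protein}} + G_{\mathrm{ligand}}\right), \\
G_{x} &= \langle E_{\mathrm{MM}} \rangle + \langle G_{\mathrm{solv}} \rangle - TS, \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}}, \\
G_{\mathrm{solv}} &= G_{\mathrm{polar}} + G_{\mathrm{nonpolar}}.
\end{align}
```

Here $x$ stands for the complex, the protein, or the ligand, and the angle brackets denote averages over trajectory frames. The van der Waals, electrostatic, and polar solvation contributions discussed in the Results correspond to $E_{\mathrm{vdW}}$, $E_{\mathrm{elec}}$, and $G_{\mathrm{polar}}$, respectively.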
Steered molecular dynamics (SMD) simulations. SMD allows analysis of the external force required to unbind a ligand from its binding site in a protein 58. GROMACS, one of the most extensively used MD packages, was employed to perform the SMD simulations. The end terminals of the proteins were processed to prepare the protein topologies, while the ligand topologies were the same as those used in the conventional MD simulations. Subsequently, the protein-ligand complexes were solvated in a rectangular box of dimensions (8.5 × 8.3 × 25) Å³. The steepest descent method was employed for the energy minimization of the protein-ligand complexes. Further, an NPT run of 100 ps was carried out to equilibrate the energy-minimized complexes. For the pull code, a spring constant of 250 kJ/mol/nm² and a constant velocity of 0.01 nm/ps were maintained. During pulling simulations, the following equation was used to calculate the external force:

$$F = k\,\left[vt - \left(X_{\mathrm{pull}}(t) - X_{\mathrm{pull}}(0)\right)\right]$$

where F = external pulling force; k = spring constant; v = constant velocity; X_pull(t) = position of the pulled atom at time t; and X_pull(0) = its initial position.
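As a worked illustration of how the pulling force follows from these quantities, the sketch below assumes the standard constant-velocity SMD form F = k[vt − (x(t) − x(0))], in which a harmonic spring is attached to a reference point moving at velocity v. The spring constant and velocity are the values stated in the text; the time and positions in the example are hypothetical numbers chosen for illustration, not simulation output.

```python
def smd_force(k: float, v: float, t: float, x_t: float, x_0: float) -> float:
    """External pulling force in constant-velocity SMD (assumed harmonic-spring form).

    k   : spring constant (kJ/mol/nm^2)
    v   : pulling velocity (nm/ps)
    t   : elapsed time (ps)
    x_t : position of the pulled group at time t (nm)
    x_0 : initial position of the pulled group (nm)
    Returns the force in kJ/mol/nm.
    """
    spring_extension = v * t - (x_t - x_0)
    return k * spring_extension


# Stated parameters: k = 250 kJ/mol/nm^2, v = 0.01 nm/ps.
# Hypothetical state: after 200 ps the ligand has moved 0.6 nm from its start.
force = smd_force(k=250.0, v=0.01, t=200.0, x_t=1.6, x_0=1.0)
print(f"Pulling force: {force:.1f} kJ/mol/nm")
```

While the ligand stays bound and barely moves, the term vt grows linearly and so does the force, which corresponds to the initial linear rise described in the Results. Once the ligand starts to follow the moving reference point, the spring extension and the force decrease.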

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.